






#### ANNUAL TECHNICAL REPORT

on

OPTICAL WAVEGUIDE SPATIAL FILTERS

to

F49620-19-0-00

AIR FORCE OFFICE OF SCIENTIFIC RESEARCH

by

R. P. Kenan and C. M. Verber

May 20, 1983

Approved for public release; distribution unlimited.



BATTELLE-COLUMBUS LABORATORIES 505 King Avenue Columbus, Ohio 43201


ADA 1 29746


#### TABLE OF CONTENTS

- INTRODUCTION
- AIMS OF THE PRESENT PHASE
- PRESENT STATUS
- OTHER ACTIVITIES
- APPENDIX A: DESIGN AND PERFORMANCE OF AN INTEGRATED OPTICAL DIGITAL CORRELATOR
- APPENDIX B: INTEGRATED OPTICAL CIRCUITS FOR NUMERICAL COMPUTATION






#### INTRODUCTION

This report summarizes project activities taking place since the date of the last annual report, June 30, 1982. That report summarized our activities directed to the development of an integrated optical spatial light modulator and to the successful demonstration of its utility in a digital optical correlator, and marked the point of redirection of the project efforts towards numerical optical processing. The specific aims of the current phase of the work are recounted below and the present status of the research is briefly described. Activities related to the dissemination of project-developed information will also be described. The brevity of this report is directly related to the realities of redirection of effort.

#### AIMS OF THE PRESENT PHASE

The present goal of the project is the demonstration of an integrated optical numerical (analog) processor for matrix-vector multiplication, utilizing a computational architecture known as an engagement processor. This architecture is described in detail in our proposal dated August 13, 1982. Briefly described, it is a variation of the systolic architecture proposed by Kung $^{(1)}$ for use in VLSI electronics. In this architecture, the data flow in a highly synchronized fashion through a region containing processing units of a simple kind, in a pulsing manner (hence the term "systolic"). With each pulse, or epoch of time, data is input to a particular processor, utilized, then passed to an adjacent processor for its use in the next time epoch. A datum is called from memory only at the boundaries of the processing region and, once called, is passed from one processing unit to the next in a regular way. This avoids the repeated references to memory that have been labeled the "Von Neumann bottleneck", and enables full utilization of the potential speed advantages of parallel processing using processing arrays. It turns out to be particularly well suited to integrated optics.

The proposed processor would multiply a $16 \times 16$ matrix by a 16-element vector, using the methods developed earlier in the program for the correlator, that is, using arrays of electrooptically actuated Bragg gratings. The maximum speed attainable with such a device is determined by a variety of factors; for the demonstration device, however, the speed will be limited by the electronics used to insert the data into the integrated optical circuit (IOC). The ultimate limit, a data rate of about 1 Gbit/sec, would be set by the capacitance of the electrodes, about 10 pF.
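The data flow described above can be illustrated with a small simulation. The engagement architecture itself is specified in the proposal rather than in this report, so the following is only a minimal sketch of a generic linear systolic array for matrix-vector multiplication, not the project's design. The function name `systolic_matvec` and the feeding schedule (vector elements shifting from cell to cell, matrix elements entering at the boundary of each cell) are assumptions made for illustration.

```python
def systolic_matvec(a, x):
    """Multiply matrix a (m rows, n columns) by vector x on a linear systolic array.

    Hypothetical schedule: cell i accumulates y[i]. Each x[j] is fetched once,
    enters cell 0 at time step j, and is handed to the next cell on every
    following step. Matrix element a[i][j] is supplied to cell i at step i + j.
    """
    m, n = len(a), len(x)
    acc = [0.0] * m       # one accumulator per processing cell
    pipe = [None] * m     # vector element currently held by each cell
    for t in range(m + n - 1):
        # Pulse: every cell passes its datum to its right-hand neighbour,
        # and a new datum (if any remain) is fetched at the boundary.
        pipe = [x[t] if t < n else None] + pipe[:-1]
        for i in range(m):
            if pipe[i] is not None:
                j = t - i  # index of the vector element now sitting in cell i
                acc[i] += a[i][j] * pipe[i]
    return acc


if __name__ == "__main__":
    print(systolic_matvec([[1, 2], [3, 4]], [5, 6]))
```

Each vector element is read from memory once and then reused by every cell in turn, which is the property the text credits with avoiding repeated memory references. A $16 \times 16$ problem would complete in 31 steps under this particular schedule.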

There are a number of difficult problems associated with the development of this processor. First, the fabrication of a photolithographic mask containing features 2 mm long $\times$ 3.4 $\mu$m wide in two arrays of 32 sets of 8 finger pairs (a total of 1024 lines of aspect ratio 588) is a challenge to any maskmaker. In the present case, the two arrays must be precisely aligned relative to one another, adding to the difficulty. Second, the drive electronics for such a processor are far from trivial. Indeed, it was decided to make the demonstration using matrices having constant values across a row to avoid the need to develop and assemble a fully operational, high-speed electronics net. Such a development would detract from the novel, optical aspects of the program and would not add to the utility of the demonstration. Third, the connection of the IOC elements to the external signal sources requires considerable care. Finally, our experience with the earlier program indicates that we must understand the operation of high-aspect-ratio Bragg gratings in order to assess the performance of the device and to indicate directions for improvement. The simple model used with the correlator is not completely adequate, so some attention will be given to finding an improved theory predicting response and crosstalk.
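The mask figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch; it assumes that "lines" counts individual fingers, two per finger pair, which is the reading that reproduces the stated total.

```python
ARRAYS = 2
SETS_PER_ARRAY = 32
FINGER_PAIRS_PER_SET = 8
FINGERS_PER_PAIR = 2      # assumption: each finger is one mask line

LENGTH_UM = 2000.0        # 2 mm feature length
WIDTH_UM = 3.4            # feature width

total_lines = ARRAYS * SETS_PER_ARRAY * FINGER_PAIRS_PER_SET * FINGERS_PER_PAIR
aspect_ratio = LENGTH_UM / WIDTH_UM

print(total_lines, round(aspect_ratio))  # prints: 1024 588
```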

#### PRESENT STATUS

A photolithographic mask has been designed and sent to our maskmaker for fabrication. As was anticipated, they have experienced some difficulties with the fabrication that have led to project delays. The mask is expected to be delivered soon. It is sketched in a highly schematic form in Fig. 1.

A second mask, to be used to fabricate a layer for connection to external devices, has been ordered and delivered. It is pictured in Fig. 2. This circuit will be fabricated on a ceramic substrate with a hole opened at the rectangle indicated in the photograph. The IOC will reside in this hole, with leads bonded to the IOC and to the leadout electrodes.

The drive electronics to input data into the circuit has been designed and is presently being assembled.



Figure 1. Highly schematic drawing of the "herringbone" electrode pattern for the matrix-vector multiplier. The actual mask will contain many such segments rather than the three shown here.



Figure 2. Photograph of mask for electrical connection to IOC. The actual IOC will be located in an opening indicated here by the central rectangle.

A number of publications relating to the optical response of Bragg gratings have been assembled and studied. There appear to be two approaches to developing a response theory. In one, an exact solution to the diffraction problem is used. The difficulty here is that the exact solution is extremely complicated, and will require careful computer implementation to be useful. In the other, a numerical approach is used from the start to solve, approximately, the diffraction problem. Both approaches will require some coding. Present inclination is to use the second approach, if the coding needed is not too extensive, because it would result in a more generally useful tool for diffraction analysis.

#### OTHER ACTIVITIES

A publication entitled "Design and Performance of an Integrated-Optical Digital Correlator" has been prepared, submitted, and accepted for publication by the IEEE Journal of Lightwave Technology. It is included as Appendix A.

A paper entitled "Integrated-Optical Circuits for Numerical Computation" was presented at the SPIE Technical Symposium East '83, Washington, D.C., April 5-7, 1983. It is included as Appendix B.

A Gordon conference on Holography and Optical Information Processing, held June 21-25, 1982, was attended by Dr. Verber, and results of the project were discussed there. As a direct result, he has been instrumental in guiding the thrust of the numerical optical computation community; has presented a paper at the International Optical Computation Conference in Boston, MA, April 6, 1983; and has participated as discussion leader in a workshop on numerical optical computation held recently in Atlanta.

#### Reference

1. H. T. Kung, "Why Systolic Architectures?", Computer 15, 37 (1982).

#### APPENDIX A

## DESIGN AND PERFORMANCE OF AN INTEGRATED OPTICAL DIGITAL CORRELATOR

C. M. Verber, R. P. Kenan and J. R. Busch

Battelle Columbus Laboratories 505 King Avenue Columbus, Ohio 43201

#### Abstract

We describe an integrated optical correlator capable of performing ordinary binary or bipolar correlations. The device consists of two SAW transducers and an electrooptic spatial light modulator in a planar Ti in-diffused LiNbO$_3$ waveguide. It is designed to correlate a 32-bit word at a 32 Mbit/sec data rate.

\*Work supported by Air Force Office of Scientific Research.

Submitted to IEEE Journal of Lightwave Technology for publication - 10/13/82.

#### Introduction

In a previous publication (1) we discussed the operation of an integrated optical correlator whose active components are a programmable electrooptic integrated optical spatial light modulator (IOSLM) and a digitally modulated surface acoustic wave (SAW) transducer. The electrooptically induced phase grating and the SAW act upon a guided optical wave to produce a time-varying optical signal which is proportional to the cross-correlation of the digital word preprogrammed in the IOSLM with the bit-stream generated by the SAW transducer. Although this correlator produces the desired correlation signal, it suffers from a design flaw which causes the correlation signal to appear on a background of "noise" whose height is proportional to the number of "ones" in that part of the reference word (in the IOSLM) that has no overlap with the signal word (carried by the SAW). In the present paper we discuss a modified correlator in which this design flaw is eliminated and significant flexibility in usage is introduced. This new correlator design employs the same type of electrooptic IOSLM as previously discussed. However, it employs two SAW transducers to generate acoustic-wave grating segments of two different frequencies; each frequency represents one of the two levels in the binary signal word. (We refer to these levels as the "zero" level and the "one" level, corresponding to the ordinary binary notation using 0 and 1. We are, however, free to choose whatever arithmetic we find convenient; an example of this is the bipolar choice using -1 and +1, which will be seen to be very convenient.) This two-SAW design produces an angular separation among the two desired output beams and the undiffracted light, which eliminates the flaw present in the earlier device.

#### Correlator Design and Operation

The operation of the correlator may be understood with reference to Figure 1 which, for simplicity, is a schematic of a 4-bit correlator. The device is fabricated on a planar Ti-indiffused LiNbO$_3$ waveguide. The active components are two SAW transducers operating at 459 MHz (level zero) and 875 MHz (level one), respectively, and an electrooptic IOSLM. The IOSLM is discussed in detail in Ref. 1. In the figure, the angles between the axes of the three transducers are greatly exaggerated. The angles are chosen so that light cannot be diffracted by the IOSLM unless it has first been diffracted by one of the SAWs. The geometry is such that light diffracted by the low-frequency (level zero) SAW enters the IOSLM at its lower Bragg angle and light diffracted by the high-frequency (level one) SAW hits the IOSLM at its upper Bragg angle. If level one (zero) is represented in the IOSLM by an electrooptic grating segment which is turned on (off) then the geometry has the following consequences:

- (i) Light diffracted by the low-frequency SAW (0) and passing through an unenergized IOSLM element (0) emerges from the interaction region in the same direction as light diffracted by the high frequency SAW (1) and an energized IOSLM element (1). We refer to this (0-0) and (1-1) direction as the "coincidence" or "+1" direction.
- (ii) In a similar fashion it can be seen that light resulting from successive (0-1) or (1-0) coincidences emerge in a second direction. We refer to this as the "anticoincidence" or "-1" direction.
- (iii) Light which, due to less than 100% SAW diffraction efficiency or due to the absence of SAW pulses, passes undeflected through the SAW region emerges from the interaction region in a direction which is different from either the coincidence or anticoincidence direction and therefore does not contribute to background noise.

The diagram in Figure 2 illustrates the angular relationships in the two-SAW correlator. The center lines of the SAWs and of the IOSLM are illustrated. It is important that a ray that addresses the device as shown should encounter the low-frequency SAW before it encounters the high-frequency SAW. This ensures that light diffracted from a particular ray by the high-frequency SAW and light diffracted from the same ray by the low-frequency SAW have a crossing point, that is, a caustic. It is a simple matter to show that the locus of such caustics is a straight line which passes through the intersection point of the two SAW center lines. We place the IOSLM along this line of caustics so that light diffracted from a particular incident ray passes through the same part of the IOSLM regardless of which SAW causes the diffraction. This arrangement fixes all of the angular relationships.
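The straight-line locus of the caustics can be seen from a short scaling argument. The following is a sketch under idealized assumptions (a collimated input beam, straight SAW center lines, and a single fixed diffraction direction for each SAW); the symbols $\mathbf{a}_i$, $\mathbf{u}_i$, $s$, $t$ and $y$ are introduced here for illustration and do not appear in the paper. Take the origin at the intersection of the two SAW center lines and label each incident ray by its signed transverse offset $y$ from the ray through the origin. Because the rays are parallel and both center lines pass through the origin, the ray with offset $y$ meets the two center lines at $y\,\mathbf{a}_1$ and $y\,\mathbf{a}_2$, where $\mathbf{a}_1$ and $\mathbf{a}_2$ are fixed vectors. The diffracted rays leave along fixed unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$, so the crossing point satisfies

$$y\,\mathbf{a}_1 + s\,\mathbf{u}_1 = y\,\mathbf{a}_2 + t\,\mathbf{u}_2 .$$

This system is linear in $y$: if $(s_0, t_0)$ solves it for $y = 1$, then $(y s_0, y t_0)$ solves it for any $y$. The crossing point is therefore

$$\mathbf{C}(y) = y\,(\mathbf{a}_1 + s_0\,\mathbf{u}_1),$$

which traces a straight line through the origin as $y$ varies. A unique solution exists because $\mathbf{u}_1$ and $\mathbf{u}_2$ are not parallel (the two SAWs have different Bragg angles), and the ordering of the SAWs stated above is what places the crossing downstream of both.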

The angles indicated in Fig. 2 are not exact but are nonetheless quite precise approximations. The location of the crossing point of the two SAWs and the IOSLM is not fixed by any specific criterion but is chosen to minimize fabrication difficulties and to optimize the synchronization of the outputs of the two SAW transducers. In general, it would be necessary to optimize the synchronization of the SAWs since the two surface acoustic waves travel in distinct directions relative to the incident light beam. This results in a gradual degradation of the interleaving of the two bit-streams corresponding to the zero and one levels of the data stream. However, because of the small angles involved, this effect is very small. For the present device, the misalignment over the width of the optical beam amounts to less than 1% of the length of one bit. Even this small effect can be minimized by choosing the location of the SAW transducers so that the two bit streams are precisely synchronized at the center of the optical beam.

To arrive at a specific correlator design within the geometric constraints discussed above it is necessary only to specify a data rate and to adopt procedures which minimize fabrication problems. A data rate of 32 Mbit/sec was chosen. This, when combined with the SAW velocity, determines the size of the IOSLM elements. Photolithographic resolution limits of 1 $\mu$m indicate that the high-frequency SAW period should not be less than 4 $\mu$m. Finally, we note that the acoustic waves undergo a diffractional spread by an amount determined by the acoustic aperture of the transducers, according to the usual formula:

$$\alpha = \Lambda/W_{a} \tag{1}$$

where $\alpha$ is the spread angle, $\Lambda$ the acoustic wavelength and $W_a$ is the acoustic aperture. It is desirable, but not necessary, to choose the ratio of the two acoustic apertures so that the two SAWs undergo the same angular spread, so that whatever diffractional degradation occurs will occur to both SAWs equally. Thus, we set

$$\Lambda_1/W_{a1} = \Lambda_2/W_{a2} \tag{2}$$

These considerations result in the device parameters displayed in Table I.

TABLE I. 32 Mbit/sec CORRELATOR DESIGN

| Item, Symbol                           | Value      |
|----------------------------------------|------------|
| Low-Frequency Transducer               |            |
| Frequency, $f_1$                       | 459 MHz    |
| Period, $\Lambda_1$                    | 7.625 µm   |
| Aperture, $W_{a1}$                     | 1.0 mm     |
| No. Finger Pairs                       | 4          |
| Bragg Angle at 0.633 µm, $\theta_1$    | 0.0189 rad |
| High-Frequency Transducer              |            |
| Frequency, $f_2$                       | 875 MHz    |
| Period, $\Lambda_2$                    | 4.0 µm     |
| Aperture, $W_{a2}$                     | 0.50 mm    |
| No. Finger Pairs                       | 8          |
| Bragg Angle at 0.633 µm, $\theta_2$    | 0.0359 rad |
| Electrooptic Grating                   |            |
| Period, $\Lambda_{eo}$                 | 8.413 µm   |
| Depth, $d$                             | 2.0 mm     |
| No. Periods/bit                        | 13.0       |
| Bit length                             | 109.4 µm   |
| Bragg Angle, $\theta_{eo}$             | 0.0171 rad |
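The entries of Table I are tied together by the design steps described above, and a short script can confirm that they are mutually consistent. This is a minimal sketch, not the authors' design procedure. The effective index `N_EFF = 2.2` is an assumption (the paper does not state it), as is the small-angle Bragg formula $\theta = \lambda / (2 n \Lambda)$ evaluated inside the guide; all other numbers are taken from the table and text.

```python
# Values taken from Table I and the text.
WAVELENGTH_UM = 0.633      # HeNe free-space wavelength
DATA_RATE_HZ = 32e6        # 32 Mbit/sec
N_EFF = 2.2                # ASSUMPTION: effective index of the guided mode

F1_HZ, PERIOD1_UM, APERTURE1_UM = 459e6, 7.625, 1000.0   # low-frequency SAW
F2_HZ, PERIOD2_UM, APERTURE2_UM = 875e6, 4.0, 500.0      # high-frequency SAW
PERIOD_EO_UM = 8.413                                     # electrooptic grating


def bragg_angle_rad(period_um, n_eff=N_EFF):
    """Small-angle Bragg angle inside the guide: lambda / (2 * n * Lambda)."""
    return WAVELENGTH_UM / (2.0 * n_eff * period_um)


# SAW velocity implied by each transducer, v = f * Lambda (m/s).
velocity1 = F1_HZ * PERIOD1_UM * 1e-6
velocity2 = F2_HZ * PERIOD2_UM * 1e-6

# One bit occupies v / (data rate) along the SAW path.
bit_length_um = velocity2 / DATA_RATE_HZ * 1e6
periods_per_bit = bit_length_um / PERIOD_EO_UM

# Diffraction spread of each SAW, Eq. (1): alpha = Lambda / W_a.
spread1 = PERIOD1_UM / APERTURE1_UM
spread2 = PERIOD2_UM / APERTURE2_UM

theta1 = bragg_angle_rad(PERIOD1_UM)
theta2 = bragg_angle_rad(PERIOD2_UM)
theta_eo = bragg_angle_rad(PERIOD_EO_UM)

print(f"SAW velocity: {velocity1:.0f} and {velocity2:.0f} m/s")
print(f"bit length: {bit_length_um:.1f} um, periods per bit: {periods_per_bit:.2f}")
print(f"spread angles: {spread1:.5f} and {spread2:.5f} rad")
print(f"Bragg angles: {theta1:.4f}, {theta2:.4f}, {theta_eo:.4f} rad")
```

Both transducers imply a SAW velocity close to 3500 m/s, which at 32 Mbit/sec gives the tabulated bit length and 13 electrooptic periods per bit. With the assumed index, the computed Bragg angles agree with the table to within about 0.0001 rad, and $\theta_2 - \theta_1 \approx \theta_{eo}$, as expected if light from the two SAWs is to reach the IOSLM at its lower and upper Bragg angles. The two spread angles differ by roughly 5%, so the tabulated apertures satisfy Eq. (2) only approximately.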

#### Binary vs. Bipolar Encoding

The input to the correlator is a 32 Mbit/sec stream which is used to modulate the r.f. inputs to one or both of the SAW transducers. If only one transducer is used, then one output of the device will be an electrical signal that is proportional to the number of coincidences of "on" bits between the preset reference word and every successive 32-bit sequence in the input data stream; the other output will be proportional to the number of coincidences of an "on" bit in the input stream with an "off" bit in the preset reference word. This corresponds to ordinary binary multiplication.

If both SAWs are used, then one output is proportional to the sum of the number of coincidences of "on" bits and the number of coincidences of "off" bits; the other output will be proportional to the sum of the number of coincidences of "on" bits in the input stream with "off" bits in the reference word and the number of coincidences of "off" bits in the input stream with "on" bits in the reference word. This corresponds to a bipolar encoding, where, for example, an "on" bit represents a +1 and an "off" bit represents a -1. These output directions will be referred to as the "+1" and the "-1" outputs, respectively. The true bipolar correlation is formed by subtracting the "-1" output from the "+1" output; this may be done electronically using a differential amplifier.

When the input stream consists of a 32-bit sequence alone, that is, with <u>no</u> SAW signals on either side, then one output of the device using the binary arithmetic is a time sequence corresponding to the correlation of the input sequence with the reference word in the IOSLM, while the other output is the correlation of the input sequence with the complement of the reference word. If the bipolar arithmetic is used, i.e., both SAW streams are present, then the correlation of the input word with the reference word will be the instant-by-instant difference between the "+1" output and the "-1" output. The importance of the separation of the two bipolar outputs into two directions can now be seen. The desired outputs are proportional to the light intensity, an intrinsically positive quantity. We get around this inconvenience by collecting positive results from one direction and negative results from another (both being positive intensities), then forming the difference electronically.

When the input stream is a continuous stream of data, then at each alignment of a 32-bit portion of the input stream with the IOSLM the outputs will correspond to a comparison of the aligned sequences. With one SAW stream, one output will be proportional to the number of SAW segments aligned with an activated IOSLM segment and the other will be proportional to the number of SAW segments aligned with an unactivated IOSLM segment. A more useful comparison occurs when the bipolar scheme is used. The "-1" output now corresponds to the logical "exclusive or" (EXOR) operation between the two words; so, when the words are identical, this output will be null. This can be a very useful output for recognition of data sequences. The "+1" output is the complement of the "-1" output and therefore is maximum when the two words are identical. In noisy situations, the "+1" output will be more useful than the "-1" output for recognition.
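The two encodings can be made concrete with an idealized model of the outputs. The sketch below is an illustration only: it assumes 100% diffraction efficiency, considers only full alignments of the stream with the reference word, and ignores the direction in which the SAW travels. The function names and the `use_zero_saw` flag are hypothetical.

```python
def correlator_outputs(reference, stream, use_zero_saw=True):
    """Return the ("+1", "-1") output counts for every full alignment.

    reference    -- list of 0/1 bits held in the IOSLM
    stream       -- list of 0/1 bits carried by the SAW(s)
    use_zero_saw -- True: both SAWs driven (bipolar arithmetic);
                    False: only the level-one SAW driven (binary arithmetic)
    """
    n = len(reference)
    plus, minus = [], []
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        p = m = 0
        for r, s in zip(reference, window):
            if s == 0 and not use_zero_saw:
                continue      # no acoustic grating here: light stays undeflected
            if r == s:
                p += 1        # (0-0) or (1-1): coincidence direction
            else:
                m += 1        # (0-1) or (1-0): anticoincidence direction
        plus.append(p)
        minus.append(m)
    return plus, minus


def bipolar_correlation(reference, stream):
    """Difference of the two detector outputs, as formed electronically."""
    plus, minus = correlator_outputs(reference, stream, use_zero_saw=True)
    return [p - m for p, m in zip(plus, minus)]


if __name__ == "__main__":
    ref = [1, 0, 1, 1]
    data = [0, 1, 0, 1, 1, 0]
    print(correlator_outputs(ref, data, use_zero_saw=False))
    print(correlator_outputs(ref, data, use_zero_saw=True))
    print(bipolar_correlation(ref, data))
```

In the bipolar case the "-1" count is the number of positions where the two words differ (the EXOR count), so it is zero at the alignment where the 4-bit reference appears in the stream, while the "+1" count reaches its maximum there.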

#### Experimental Results

The correlator is fabricated upon a planar indiffused single-mode LiNbO<sub>3</sub>:Ti waveguide. The two SAW transducers and the IOSLM are fabricated in a single photolithographic step using hard-contact techniques. In the current version of the device prism couplers are used, and the source (a HeNe laser), the detector, and their associated optics are bulk-optical components. Hybrid integration of source and detector is, of course, well within the current state of the art.

The experimental arrangement used to exercise the correlator is shown in Figure 3. The heart of this arrangement is an HP Model 8018A 50 MHz Serial Data/PRBS Generator. The 8018A has two NRZ digital outputs. "A" produces positive pulses which correspond to level one in the output word and "$\overline{\rm A}$" produces positive pulses corresponding to the zero level in the output word. Thus, the A output can be used to modulate one of the SAW transducers and the $\overline{\rm A}$ output can be used to modulate the other.

To exercise the correlator a preset word is read out of the word generator into the data register. The data register latches and applies appropriate voltages to the 32 elements of the IOSLM. The digital word generator is then used to regenerate this word with the output now being used to control the modulation of the SAW transducers. The digital word generator also has the ability to bury the preset word in a pseudorandom bit sequence (PRBS). This feature is useful in determining the ability of the correlator to distinguish the preprogrammed word from a series of random background words.

In Figure 4 we show the calculated coincidence output for the autocorrelation of the digital word displayed in the lower oscilloscope trace. In the upper trace is the output of detector A when this word is applied via the digital word generator to the SAW transducers. As can be seen, the experimental result and the calculated output function are quite similar. In Figure 5 we display the computed and observed output of detector B. The expected output is computed by summing the anticoincidences as the signal word moves across the reference word. Once again the experimental and computed outputs are quite similar. It should be noted that at the instant the coincidence output reaches its maximum, the anticoincidence output is zero.

The worst case encountered when attempting to perform the recognition function is discrimination against a signal word which differs from the reference word in only one bit. If the number of "one" or "on" bits in the reference word is  $N_1$ , then the height of the autocorrelation peak is proportional to  $N_1$  when the ordinary binary arithmetic is used, and to N (the total number of bits in the word) when the bipolar arithmetic is used. In the binary case, if we change a zero to a one in the signal word, then the correlation peak is unchanged in height, while changing a one to a zero produces a peak proportional to  $N_1$ -1. In the bipolar case, either change causes reduction in peak height to N-2. It is evident, therefore, that the bipolar arithmetic is superior for recognition applications. It might be noted that the binary arithmetic gives the same correlation peak height for a word of all ones as for the reference word.
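The peak heights quoted above are easy to verify for a small case. The sketch below uses a hypothetical 8-bit reference word with $N = 8$ and $N_1 = 4$ (not a word from the paper) and evaluates only the fully aligned position, under the same idealized assumptions of perfect diffraction efficiency.

```python
def binary_peak(reference, signal):
    """Aligned '+1' output with only the level-one SAW: on-on coincidences."""
    return sum(1 for r, s in zip(reference, signal) if r == 1 and s == 1)


def bipolar_peak(reference, signal):
    """Aligned bipolar correlation: coincidences minus anticoincidences."""
    return sum(1 if r == s else -1 for r, s in zip(reference, signal))


def flip(word, index):
    """Return a copy of word with the bit at index inverted."""
    out = list(word)
    out[index] ^= 1
    return out


reference = [1, 0, 1, 1, 0, 0, 1, 0]      # hypothetical word: N = 8, N1 = 4
zero_to_one = flip(reference, 1)          # a zero changed to a one
one_to_zero = flip(reference, 0)          # a one changed to a zero
all_ones = [1] * len(reference)

for name, word in [("match", reference), ("0->1", zero_to_one),
                   ("1->0", one_to_zero), ("all ones", all_ones)]:
    print(name, binary_peak(reference, word), bipolar_peak(reference, word))
```

The binary peak stays at $N_1 = 4$ for the exact match, for the zero-to-one error, and for the all-ones word, and drops to 3 only for the one-to-zero error. The bipolar peak is 8 for the match and $N - 2 = 6$ for either single-bit error, which is the discrimination advantage the text describes.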

Another parameter of the correlator which is of interest is its response time. A simple test which revealed some problems in this area was to look at the autocorrelation of a simple "picket fence" test word (alternating ones and zeros) which was input into the correlator at successively higher frequencies. The ideal output for both detectors A and B is a series of triangular waves of first increasing and then decreasing amplitudes. For both detectors we would expect the minima to be zero for each of the triangular waves. As can be seen in Figure 7, the desired behavior is almost achieved for 150 nsec input. However, as the bit duration is decreased there is a significant degradation in the output. A number of factors combine to produce this degradation at high data rates: diffraction spread from the single bits of the SAWs and the IOSLM, which present optical apertures of 109 µm; the response time of the photodetector and associated electronics; geometrical effects; and less than optimal diffraction efficiency of the IOSLM. Examination of the experimental results and analysis of the relationships among the SAW and the IOSLM grating segments suggest that the dominant effects are the reduced effective diffraction efficiency of an isolated, activated IOSLM segment and the reduced effective width of an unactivated segment surrounded by activated ones. Both of these are geometric effects. The first arises because light enters and leaves a segment along its sides as well as across its faces, so some rays do not experience the entire depth of the segment and pass through the segment and into the wrong output beam. The second effect is due to the smaller clear aperture of an unactivated segment when it is viewed from Bragg incidence because of the high aspect ratio (ratio of grating depth to segment height) involved. The result is that some of the light that should have passed through an unactivated segment encounters adjacent, activated segments and is deflected into the wrong beam. In this sense, the "picket fence" is the worst possible case since every unactivated segment (a zero) has activated neighbors. The use of equal-efficiency SAW transducers and the optimization of the acoustooptic overlap integral by improved waveguide design are expected to result in significant improvement in the frequency response of the correlator.

#### Discussion

We have shown that the device described above is indeed capable of producing the correlation of one binary (two-level) data word with another at the data rate of 32 Mbits/sec. The device has the feature that either ordinary binary (0/1) or bipolar (+1/-1) arithmetic can be accommodated. Because of the encoding of the outputs in light <u>intensities</u>, negative quantities cannot be directly encoded, so the "-1" contributions to the correlations are separately calculated and subtracted after detection whenever the bipolar arithmetic is used.

The possibility of using the device in a data-recognition (data-retrieval) system was mentioned earlier. Clearly, so long as the noise level is low, a scheme using the "-1" output direction to detect a match has a signal-to-noise advantage over the use of the "+1" output because the output is null (ideally) whenever a match occurs; reliable recognition depends only on one's ability to discriminate the output from one or more segments against the low noise level. In systems where high noise levels may occur, the "+1" output or the correlation ("+1" output minus the "-1" output) has the advantage because the high peak for identification can be discriminated against the noise. Reliable recognition now depends on the ability to discriminate a peak 32 units high against one 31 units high (32 against 30 for the correlation). Clearly, shorter words are more reliably recognized than longer ones. If long data sequences need to be recognized against noise, then some recursive scheme using shorter words and recognizing the occurrence of the desired data in segments might be useful. (2)

Finally, we wish to point out that the present device is the first example of an integrated optical systolic processor. According to Kung's terminology, (3) the device would be termed a "Design F" convolution array having local weights (the reference word), moving input variables (the SAW-encoded stream), and a "fan in" of results (the lens that spatially integrates the light emerging from the IOSLM). The IOSLM that is the heart of this device has, however, a number of other applications in analog optical numerical computation using systolic and related architectures. An example of such a device is a matrix-vector multiplier in which the matrix data and the vector data move at right angles to one another in a variation on the systolic type of architecture. (4) Investigations into processing architectures for implementing a variety of matrix operations are under way. Such integrated-optical processing systems appear to offer an attractive solution to several special-purpose computational problems where extremely high speed is required, but analog accuracy is acceptable.

#### ACKNOWLEDGMENT

The authors would like to acknowledge the dedication and craftsmanship of Mark Parmenter who fabricated the devices used in this work.

#### References

1. C. M. Verber, R. P. Kenan and J. R. Busch, "Correlator Based on an Integrated Optical Spatial Light Modulator", Applied Optics, 20, 1626 (1981).
2. Jacques E. Ludman, H. J. Caulfield, and P. Denzil Stillwell, Jr., "Robust Optical Long-Code Processor", Optical Engineering, 21, 833-836 (1982).
3. H. T. Kung, "Why Systolic Architectures?", Computer, 15, 37-46 (1982).
4. P. Tamura, Private Communication.

#### FIGURE CAPTIONS

- Figure 1. Schematic of the correlator showing the angular separation of the "coincidence", the "anticoincidence", and the undiffracted beams.
- Figure 2. Layout geometry for the 2-SAW correlator, showing angular relationships. The IOSLM is placed along the caustic generated by the two SAW's. Angles are given for the small-angle approximation; only the SAW and IOSLM axes are shown.
- Figure 3. Experimental arrangement used to exercise the correlator. Angular separation of the "coincidence", "anticoincidence" and zero-order beams is shown.
- Figure 4. Calculated autocorrelation of the 32-bit digital word. The experimental correlation is shown in the insert along with the input signal.
- Figure 5. Anticoincidence output corresponding to Fig. 4.
- Figure 6. Autocorrelation of the 32-bit word 001001111100000111110011000001111 showing a) the coincidence (top) and anticoincidence (bottom) outputs, b) the bipolar autocorrelation and c) the bipolar autocorrelation appearing twice amid cross correlations with a pseudorandom bit sequence.
- Figure 7. Outputs of the coincidence (upper trace) and anticoincidence (lower trace) detectors during the autocorrelation of an a) 150 nsec b) 60 nsec and c) 30 nsec picket-fence test word. Degradation of the response at high frequencies is evident.























#### APPENDIX B

Integrated optical circuits for numerical computation\*

C. M. Verber and R. P. Kenan

Battelle Columbus Laboratories 505 King Avenue, Columbus, Ohio 43201

#### Abstract

Recent developments in the design of integrated optical circuits for performing optical numerical computations are discussed. The use of systolic architectures for these IOC's is described and the natural marriage of IOC's with the systolic concept is discussed. Examples include optical binary correlation, polynomial evaluation, and matrix multiplication.

#### I. Introduction

There has recently been an increasing amount of interest in the application of optical techniques to the solution of a variety of computational problems. The reasons most commonly cited for this interest are the high processing speeds and the low power consumption which are potential characteristics of optical analog devices, especially if the problem and the algorithm are well chosen. We discuss here the possibilities for the use of integrated optical circuits for performing several specific numerical computations and discuss one existing device and suggest several others which are designed in keeping with the basic architectural criteria for systolic processors. In Section II we review these criteria and discuss them in relationship to integrated optics technology.

There are a number of basic integrated optic components which are available for use in computational devices. In this paper we limit ourselves to planar as opposed to channelized IOCs and rely heavily upon the use of electrooptic gratings whose properties are reviewed in Section III. As an example of an operational device we describe a 32-bit digital correlator which operates at 32 MBit/sec. This is followed by several suggestions for matrix multiplication and polynomial processors. Among the problems which are associated with these optical devices are a lack of dynamic range and severe nonlinearities. Approaches to the solution of the second of these problems are presented in Section VII.

#### II. Systolic architectures and integrated optical circuits

The approach to computer design known as systolic array architecture was developed by Kung<sup>1</sup> and others as a method of approaching the problem of VLSI computer design. The basic guidelines are:

- a. Each datum should be fetched from memory only once to avoid the "von Neumann bottleneck".
- b. Each chip should contain only a small number of different processor subunits, although these subunits may be repeated many times on each chip.
- c. Connections between subunits should be only to nearest neighbors to facilitate the rapid flow of data and to simplify fabrication.

We would be hard pressed to compile a better list of design guidelines for integrated optical circuits. We do not yet have available an optically addressable memory for IOCs, although some of Nishihara's² surface holograms may be adaptable for this purpose. It is therefore essential that the recourse to memory be minimized since the act of fetching data from a digital store is much slower than the rate at which the IOC is capable of using that data. Second, at this stage in the development of IOC technology, we have only a small number of operational building blocks available to us. The second guideline is therefore compatible with IOC technology, if only by default. The third guideline is, perhaps, not as important for optical as for electronic systems since it is possible to have optical carriers intersect in either planar or in channel³ configurations without causing significant crosstalk. Complex interconnection schemes can therefore be implemented without requiring a multilayer structure. However, since the progress of the data through an optical processor is controlled by the speed of light in the device and not by a digital clock, it will be necessary to pay attention to path lengths in high-speed devices to assure that proper synchronism of the data flow is maintained.

There are several obvious advantages to using integrated as opposed to bulk optical techniques for the implementation of high-speed computational algorithms. Perhaps the most important is the fact that a variety of high-speed integrated-optical modulators and switches have already been developed and that these require electrical drive signals which are several orders of magnitude less than comparable bulk components. In addition, the integrated systems tend to be more compact than conventional optical systems and lend themselves to mass production by more-or-less conventional photolithographic techniques. A major shortcoming of the IOCs is that they are not capable of the same flexibility in handling two-dimensional computations as are the bulk devices. A hybrid approach seems to be the obvious solution to this problem.

#### III. Electrooptic grating structures

The devices to be described in the following sections rely heavily on the use of electrooptically-induced gratings. In this section we will briefly describe the generation and the operation of these gratings.

The gratings are generated via the electrooptic effect using the fringing field from a set of interdigital surface electrodes. The basic electrode structure is illustrated in Fig. 1. The electric field immediately below the electrodes is normal to the waveguide surface, and at the surface in the gap it is tangential to the waveguide surface. Both of these fields are periodic with period equal to four line widths (if the line and gap widths are the same). The amplitudes of the index variations induced by the two fields are not, however, equal because they generally invoke different electrooptic coefficients. The net effect of the electrode configuration is to produce a complicated index profile. The fields, to which the refractive index variations are proportional, have been given by Engan<sup>6</sup> in a Fourier series; for our uses, only the fundamental component is important. The presence of two fields causes the index pattern to be shifted relative to the electrode structure, that is, the maximum of the index modulation does not occur at the centers of the gaps or of the electrode lines, but is displaced somewhat.

The induced gratings can be operated at high efficiency, if desired, using low voltages. A typical result is 95% efficiency at voltages of 4-10 volts for a grating with electrode lines 2 mm long and period 8-15 $\mu$m. The diffraction efficiency of a grating having many fingers appears to follow Kogelnik's<sup>7</sup> theory in form, but typically does not reach 100% efficiency. The reason for this may be the incomplete overlap of the electric field with the optical field because of the exponential decay of the former with depth into the waveguide. Finally, we mention that the capacitance of the surface electrodes on y-cut LiNbO<sub>3</sub> is about 0.5 pF/mm of finger length/finger pair, or 1 pF/finger pair for 2 mm long fingers.

Electrooptic gratings are capable of performing simple arithmetic (logic) operations on analog (binary) voltage signals. The simplest such operation is performed using the basic element pictured in Fig. 2. The diffracted light beam has intensity equal to $\eta I_0$, where $I_0$ is the incident intensity and $\eta$ is determined by the voltage <u>difference</u> between the two electrodes. For binary (two-level) signals, the result is the <u>exclusive OR (EXOR)</u> logic operation. For analog voltages, the result is a nonlinear function of the voltages, but for small signals, it is proportional to $|V_1-V_2|^2$, the square of the voltage difference. The problem of the nonlinearity in the grating response is discussed in Section VII.
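The EXOR behavior can be illustrated with a short numerical sketch. It assumes the Bragg-incidence efficiency law $\eta = \sin^2(\alpha V)$ given in the linearization discussion of this paper, applied to the voltage difference $V_1 - V_2$. The logic levels (0 V and a full-diffraction voltage `V_FULL`) and the function names are hypothetical choices made for illustration only.

```python
import math

V_FULL = 5.0                      # assumed voltage giving eta = 1 (hypothetical value)
ALPHA = math.pi / (2.0 * V_FULL)  # so that ALPHA * V_FULL = pi / 2


def diffraction_efficiency(v1, v2, alpha=ALPHA):
    """Assumed grating response: eta = sin^2(alpha * (V1 - V2))."""
    return math.sin(alpha * (v1 - v2)) ** 2


def exor(bit_a, bit_b):
    """Drive the two electrodes with binary levels and threshold the diffracted light."""
    eta = diffraction_efficiency(bit_a * V_FULL, bit_b * V_FULL)
    return round(eta)


truth_table = [(a, b, exor(a, b)) for a in (0, 1) for b in (0, 1)]

# Small-signal check: eta approaches (alpha * (V1 - V2))^2 for small voltages.
small = diffraction_efficiency(0.10, 0.05)
small_approx = (ALPHA * 0.05) ** 2
```

Only the voltage difference matters in this model, so equal inputs (both low or both high) leave the beam undiffracted, which is what makes the element an EXOR gate.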

To multiply two signals together, we use the "herringbone" structure shown in Fig. 3. This is essentially two grating-inducing electrode systems using slanted fingers and placed so that the output of the first is the input to the second. In the figure, the gratings have been drawn to share one electrical lead, the ground, but this is not required. The output here is the input intensity multiplied by the product of the efficiencies of the two gratings. Again, because the grating response is nonlinear in the voltages, some arrangement must be used to linearize the device.
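As a rough sketch of the herringbone operation, the following models each grating with the same assumed response $\eta = \sin^2(\alpha V)$ and multiplies the two efficiencies. The voltage scale `V_FULL` and the function names are illustrative assumptions, not device parameters from the text.

```python
import math

V_FULL = 5.0                      # assumed full-diffraction voltage (hypothetical)
ALPHA = math.pi / (2.0 * V_FULL)


def eta(v, alpha=ALPHA):
    """Assumed single-grating efficiency at Bragg incidence."""
    return math.sin(alpha * v) ** 2


def herringbone_output(i0, v1, v2, alpha=ALPHA):
    """Light diffracted by the first grating is the input to the second."""
    return i0 * eta(v1, alpha) * eta(v2, alpha)


# Binary drive levels reproduce the logical AND.
and_table = [(a, b, round(herringbone_output(1.0, a * V_FULL, b * V_FULL)))
             for a in (0, 1) for b in (0, 1)]

# Analog drive: the output is eta1 * eta2, which is not linear in the voltages.
analog = herringbone_output(1.0, 1.0, 2.0)
```

With analog voltages the output is a product of two sine-squared terms, which is why some linearizing arrangement is needed before the device can multiply the input variables themselves.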

#### IV. The IO digital correlator: a systolic processor of design F

The first IOC that we wish to discuss is an optical space-integrating correlator<sup>8</sup> for binary words that was developed for AFOSR. In this correlator a set of 32 electrodes of the type illustrated in Fig. 1 is arranged in a line to form a means for spatially modulating a light beam in a planar optical waveguide. This array of gratings, which we call an IOSLM (Integrated Optical Spatial Light Modulator), is activated with a pattern of bits corresponding to a reference word. In its simplest configuration,<sup>8</sup> a binary data stream is used to modulate a surface acoustic wave (SAW) having the same period as the electrooptic grating and traveling parallel to it, producing a traveling, spatially-modulated, acoustic grating that contains the signal to be compared to the reference word. The full width of the electrooptic grating is illuminated by a guided light beam incident at the common Bragg angle of the two gratings (the IOSLM and the SAW). Light that is undiffracted and light that is doubly diffracted pass out of the gratings region in one direction, while light that is once diffracted passes in a second direction. Integration of the output light in the first direction by a lens produces a time-dependent electrical signal, upon detection, that is proportional to the total number of coincidences of "0-0" and "1-1" bit combinations in the two data sequences. Similarly, the light output in the second direction can be integrated to yield an electrical signal that is the complement of the first. When the signal consists simply of a 32-bit word, the difference between these two signals is the correlation of the two words considered as having been bipolar-encoded; that is, the words are thought of as being represented by +1 and -1 instead of as +1 and 0 as in ordinary binary arithmetic. This encoding is especially useful for performing recognition operations.
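The relation between the two detector outputs and the bipolar correlation can be checked with a small simulation. This is only a sketch: it treats the device as an ideal bit-wise comparator, uses a hypothetical 8-bit word instead of the 32-bit word of the actual device, and ignores the direction in which the acoustic signal travels.

```python
def detector_counts(reference, window):
    """Ideal coincidence ("0-0" and "1-1") and anticoincidence counts for one alignment."""
    coincidences = sum(1 for r, s in zip(reference, window) if r == s)
    return coincidences, len(reference) - coincidences


def bipolar_correlation(reference, window):
    """Correlation with bits re-encoded as 0 -> -1 and 1 -> +1."""
    return sum((2 * r - 1) * (2 * s - 1) for r, s in zip(reference, window))


def correlate_stream(reference, stream):
    """Difference of the two detector outputs as the data slide past the reference."""
    n = len(reference)
    output = []
    for t in range(len(stream) - n + 1):
        coincid, anticoincid = detector_counts(reference, stream[t:t + n])
        output.append(coincid - anticoincid)
    return output


reference = [0, 0, 1, 0, 1, 1, 1, 0]           # hypothetical 8-bit reference word
stream = [1, 0, 1] + reference + [0, 1, 1]     # reference embedded at offset 3
difference_signal = correlate_stream(reference, stream)
```

Each matching bit pair contributes +1 to the bipolar product and each mismatched pair contributes -1, so the coincidence count minus the anticoincidence count equals the bipolar correlation, and it reaches the word length when the signal aligns with the reference.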

The correlator described above has some drawbacks, including contamination of the output beam in the "-1/-1 + 1/1" direction by residual and scattered incident light and presence of light in the "1/-1 + -1/1" direction even in the absence of any signal wave. These defects can be corrected by using two acoustic waves to carry the signal and a nonparallel geometry.<sup>9</sup> This has been done,<sup>10</sup> but will not be further discussed here.

We want to point out here that this correlator is an example of a systolic processor of "design F" as described by Kung<sup>1</sup>. This means that the weights (here, the bits of the reference word) remain fixed in place while the data (here, the bits of the signal word) move, and the output is collected by "fan-in" (here, the integrating lens and detectors). The correlator design accurately mimics the most elementary way that space-integrating correlators are commonly visualized as operating, so it is not surprising that it turns out to be one of the examples of Kung's group of systolic convolution algorithms. The relevance of its systolic nature is that it illustrates an advantage of integrated-optical implementation of systolic architectures, namely, simple fabrication. The device, in both its original and its improved forms, requires only one photolithographic step to fabricate. Simple fabrication is an advantage that will occur for all of the devices that will be discussed in this paper. It will not always be possible to get by with only one photolithographic step, but none will require more than two.

#### V. Matrix multiplication

It was shown above that the herringbone structure of Fig. 3 could be used to perform analog multiplication. This concept can be simply extended to compute the scalar product of two vectors as shown in Fig. 4. Here the herringbone is segmented, each segment being used to generate the product $A_iB_i$. The products are then summed with the lens to generate the scalar product. We shall now show how this structure and some modifications of this structure can be used to perform vector-matrix and matrix-matrix multiplication.

It is possible to compute the product of a matrix and a vector using the segmented herringbone structure along with the engagement architecture shown in Fig. 5. Voltages representing the vector components and the matrix elements are arranged in the sequence indicated in the figure and synchronously stepped through the engagement region, which is simply the segmented herringbone device. The successive products are accumulated on integrating photodetectors as indicated. A schematic of an IOC for accomplishing this is shown in Fig. 6. The major problem in the practical implementation of this technology lies not in the fabrication of the IOC, but in the design of a suitable electronic drive circuit which neither unduly limits the speed of the optical device nor overwhelms it with the sheer bulk of the electronic hardware.
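The data flow can be sketched in software. The exact sequencing of Fig. 5 is not reproduced here; the schedule below is an assumed one, in which the vector components step through the segments one segment per clock period and each matrix row is delayed to meet them. All values are taken as non-negative, as befits intensities, and ideal linear multipliers are assumed.

```python
def engagement_matvec(a, x):
    """Accumulate y[i] = sum_k a[i][k] * x[k] with an assumed skewed schedule.

    At clock t, segment i multiplies a[i][t - i] by x[t - i]; its integrating
    detector adds the product to the running total for y[i].
    """
    n_rows, n_cols = len(a), len(x)
    detectors = [0.0] * n_rows
    for t in range(n_rows + n_cols - 1):
        for i in range(n_rows):
            k = t - i                      # index of the datum now in segment i
            if 0 <= k < n_cols:
                detectors[i] += a[i][k] * x[k]
    return detectors


matrix = [[0.1, 0.2, 0.3],
          [0.4, 0.5, 0.6]]
vector = [0.5, 0.25, 1.0]
result = engagement_matvec(matrix, vector)
direct = [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]
```

Each datum is presented to the processor once and moves only to its neighbor, which is the property that makes the scheme systolic; the electronics that produce this timed sequence of voltages are the hard part.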

A systolic approach to matrix-matrix multiplication is shown schematically in Fig. 7. The data flow through the engagement region as indicated, each box in the engagement region being a device which performs a running sum of the products of the respective matrix components which again are flowing synchronously through the device. Note that in order to obtain proper registration of the elements of the two matrices, the components must enter the engagement region in an appropriately skewed array.
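The role of the skew can be seen in a small simulation. The sketch below uses the textbook output-stationary systolic schedule, which is an assumption and may differ in detail from Fig. 7: row i of A is delayed by i clock periods, column j of B by j periods, and cell (i, j) keeps a running sum.

```python
def systolic_matmul(a, b):
    """Simulate C = A * B on a grid of accumulating cells with skewed inputs.

    With row i of A delayed by i clocks and column j of B delayed by j clocks,
    the pair (a[i][k], b[k][j]) meets in cell (i, j) at clock t = i + j + k.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for t in range(n + m + p - 2):
        for i in range(n):
            for j in range(p):
                k = t - i - j              # which element pair is in cell (i, j) now
                if 0 <= k < m:
                    c[i][j] += a[i][k] * b[k][j]
    return c


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, 0.0], [1.0, 2.0]]
C = systolic_matmul(A, B)
```

Without the skew, elements with mismatched inner indices would arrive at a cell together and the running sums would mix unrelated products.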

A schematic of an integrated optical circuit for implementing the algorithm of Fig. 7 is shown in Fig. 8. In this figure the herringbone structure has been disassembled. A uniform plane guided wave is incident upon  $b_{ij}$  modulator units where it has the appropriate intensity modulation impressed upon it. This information is then carried by the light through a series of beam splitters which distribute it to the appropriate  $a_{ij}$  modulators. Since the optical distribution of information is essentially instantaneous compared to the rate at which the electronic drive circuitry can shift voltages through the system, we must remove the skew from the A matrix element array to maintain proper synchronism. Once again it would appear that the major challenge in the fabrication of a complete matrix-matrix multiplier using these concepts will be in the design of high-speed, compact electronic drive circuitry.

#### VI. Pipeline processor for polynomial evaluation

Recently, Verber et al.<sup>11</sup> proposed an optical pipeline processor for the evaluation of polynomials. Their systolic architecture for performing this important task optically was designed initially to utilize bulk optical components, but an integrated-optical implementation was also proposed. It is this design that we discuss here.

The first step in designing an optical processor for polynomial evaluation is to rewrite the polynomial in a recursive form, using synthetic division:

$$
\begin{aligned}
y = p(x) &= a_N x^N + a_{N-1} x^{N-1} + \cdots + a_1 x + a_0 \\
&= (\cdots((a_N x + a_{N-1})x + a_{N-2})x + \cdots + a_1)x + a_0 .
\end{aligned}
$$

It is easily seen in this form that the polynomial can be evaluated recursively using a simple unit that multiplies its two inputs together and adds a constant to form one output and passes one of the inputs through to form another output, as shown in Fig. 9. Chaining N of these units together to form a pipeline will then form an evaluator for a polynomial of order N.
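The pipeline can be sketched as a clocked simulation. The cell follows the description of the basic unit in the text; the latching scheme and the function names are illustrative assumptions, and ideal (linear) multipliers are assumed.

```python
def horner_cell(p_in, x, a):
    """One unit: multiply the two inputs, add a constant, pass x through."""
    return p_in * x + a, x


def pipeline_evaluate(coeffs, xs):
    """Evaluate p(x) for a stream xs; coeffs = [a_N, ..., a_1, a_0], N >= 1."""
    n_stages = len(coeffs) - 1
    if n_stages < 1:
        raise ValueError("need at least a first-order polynomial")
    stages = [None] * n_stages      # (p, x) held at the output of each unit
    outputs = []
    # Empty pulses at the end flush the pipeline after the last argument enters.
    for x_in in list(xs) + [None] * n_stages:
        if stages[-1] is not None:
            outputs.append(stages[-1][0])
        # Update from the output end backwards so each unit sees last pulse's data.
        for s in range(n_stages - 1, 0, -1):
            prev = stages[s - 1]
            stages[s] = None if prev is None else horner_cell(prev[0], prev[1], coeffs[s + 1])
        stages[0] = None if x_in is None else horner_cell(coeffs[0], x_in, coeffs[1])
    return outputs


# p(x) = 2x^3 + x + 3 evaluated at three arguments
values = pipeline_evaluate([2.0, 0.0, 1.0, 3.0], [0.5, 1.0, 2.0])
```

After the first N pulses fill the pipeline, one finished evaluation leaves the last unit on every pulse, even though each individual result still takes N pulses to assemble.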

Implementation of this architecture as an integrated optical circuit (IOC) can be accomplished using an electrooptic grating for the multiplier and an ordinary surface grating of, say,  $As_2S_3$  for an adder. Although both elements are gratings, they have characteristics that are sufficiently different to warrant discussion.

The operation of simple electrooptic gratings has already been discussed in Section III. The relevant feature for the present section is the typical large period, usually larger than 8  $\mu m$ . This means that these gratings, in spite of their depth, are not very selective. The wavelength selectivity of a Bragg grating can be expressed in terms of the angular selectivity through

$$\Delta \lambda_{1/2} = 2n \Lambda \cos \theta_B \Delta \theta_{1/2}$$

with $\Delta\theta_{1/2}$ being about 0.9 period/depth. In LiNbO<sub>3</sub>, a grating with period 8 $\mu$m and depth 2 mm yields $\Delta\lambda_{1/2}$ > 1200 Å. Hence, different light sources, having different wavelengths, can be used with assurance that the multipliers will operate properly.

A surface grating fabricated holographically in a suitable material like $As_2S_3$, in contrast to the electrooptic case, can be made to be much more wavelength selective because of the very small periods that can be achieved. At a wavelength of 0.83 $\mu$m, a surface grating having a Bragg angle of 30 degrees and a depth of 1 mm has a wavelength selectivity of about 6 Å. This means that light from one source can pass through the surface grating without being diffracted, while light from a second source can be efficiently diffracted by the grating, so that addition can take place without loss.
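The two selectivity figures can be compared numerically using the expression $\Delta\lambda_{1/2} = 2n\Lambda\cos\theta_B\,\Delta\theta_{1/2}$ with $\Delta\theta_{1/2} \approx 0.9\,\Lambda/d$. The sketch below assumes a guide index of 2.2 for both gratings, a 0.83 $\mu$m source for both, and the Bragg condition $\lambda = 2n\Lambda\sin\theta_B$ to relate period and angle; none of these assumptions is stated in the text, so the results indicate orders of magnitude only.

```python
import math

N_GUIDE = 2.2          # assumed effective index of the guide
WAVELENGTH = 0.83e-6   # metres; assumed to apply to both gratings


def wavelength_selectivity(n, period, depth, bragg_angle):
    """Half-width in wavelength: 2 n Lambda cos(theta_B) * (0.9 Lambda / depth)."""
    delta_theta = 0.9 * period / depth
    return 2.0 * n * period * math.cos(bragg_angle) * delta_theta


# Electrooptic grating: period 8 um, depth 2 mm; angle from the assumed Bragg condition.
eo_period = 8e-6
eo_angle = math.asin(WAVELENGTH / (2.0 * N_GUIDE * eo_period))
eo_selectivity = wavelength_selectivity(N_GUIDE, eo_period, 2e-3, eo_angle)

# Surface grating: Bragg angle 30 degrees, depth 1 mm; period from the same condition.
sg_angle = math.radians(30.0)
sg_period = WAVELENGTH / (2.0 * N_GUIDE * math.sin(sg_angle))
sg_selectivity = wavelength_selectivity(N_GUIDE, sg_period, 1e-3, sg_angle)

eo_angstrom = eo_selectivity * 1e10
sg_angstrom = sg_selectivity * 1e10
```

Under these assumptions the electrooptic grating comes out above 1200 Å and the surface grating at a few ångströms, the same order as the value quoted above. The difference of more than two orders of magnitude is what lets the coarse gratings act on every source while each fine grating acts on only one.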

With these remarks in mind, we can consider the IOC pictured in Fig. 10. The coefficients of the polynomial are entered by modulating individual light sources, so N+1 light sources are needed. These sources are selected to have wavelengths differing by amounts sufficient to allow their light to pass all of the surface gratings save the one through which they are injected. Each source has its own collimating waveguide lens, e.g., a Luneburg lens. Each unit of the processor consists of an electrooptic multiplier followed by a surface-grating adder. The electrooptic multipliers are actuated by the voltage x, corresponding to the argument at which the polynomial is to be evaluated. Since $a_4$ passes through all multipliers, it is multiplied by $x^4$; $a_3$ is similarly multiplied by $x^3$; etc. The relatively coarse electrooptic gratings operate on all the light incident on them because of the close spacing of the wavelengths. In contrast, the surface gratings act only on the light of the wavelength of the source that they are injecting into the pipeline. The x-values propagate down the pipeline from left to right, following the partially assembled polynomial. Once the pipeline is filled, the processor will output one evaluation per "pulse", where a pulse is the time required for light to traverse one unit of the processor.

Finally, it should be noted that the processor described operates with light intensities which are intrinsically positive and real, and inserts the coefficients by modulation of a light source, so the a's are also positive and real. It is, however, a simple matter to stack pipelines in parallel to implement both complex a's and complex x's, including components of either sign. Since adding both signs requires doubling the number of pipelines and implementing complex numbers also doubles the number of pipelines, the fully complex system requires 16 pipelines, with associated electronics. Fig. 11 shows eight pipelines configured to implement complex x and real a, with the additional configuration to extend to complex a indicated below.
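One possible way to see why each sign doubles the pipeline count is sketched below for the purely real case. The decomposition is an assumption made for illustration and is not necessarily the division shown in Fig. 11: the coefficients are split into non-negative parts, the even and odd powers are separated to handle the sign of x, and four non-negative pipelines are combined electronically.

```python
def horner(coeffs, x):
    """Evaluate a polynomial, coeffs = [a_N, ..., a_0], by the recursive form."""
    p = 0.0
    for a in coeffs:
        p = p * x + a
    return p


def signed_eval(coeffs, x):
    """Evaluate p(x) for real coefficients and real x of either sign,
    using only pipelines whose coefficients and argument are non-negative."""
    n = len(coeffs) - 1
    u = abs(x)                                  # non-negative argument fed to every pipeline
    even = [a if (n - i) % 2 == 0 else 0.0 for i, a in enumerate(coeffs)]
    odd = [a if (n - i) % 2 == 1 else 0.0 for i, a in enumerate(coeffs)]

    def pos(c):
        return [max(a, 0.0) for a in c]

    def neg(c):
        return [max(-a, 0.0) for a in c]

    # Four non-negative pipelines: two coefficient signs times two power parities.
    e_plus, e_minus = horner(pos(even), u), horner(neg(even), u)
    o_plus, o_minus = horner(pos(odd), u), horner(neg(odd), u)
    sign = -1.0 if x < 0 else 1.0
    return (e_plus - e_minus) + sign * (o_plus - o_minus)


coefficients = [1.0, -2.0, 0.0, 3.0]            # x^3 - 2x^2 + 3
check_neg = signed_eval(coefficients, -1.5)
check_pos = signed_eval(coefficients, 2.0)
```

Two pipelines cover the coefficient signs and two more cover the sign of the argument, giving four for the real case; repeating the same doubling for the real and imaginary parts of both quantities is consistent with the count of 16 given above.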

#### VII. The linearization problem

Throughout this paper, we have utilized electrooptic gratings for multiplication, and have indicated some of their advantages in this role. Here, we discuss one disadvantage of Bragg gratings, namely, their inherent nonlinearity, caused by the dependence of diffraction efficiency on voltage. The preferred approach, of course, would be to find a multiplier element having a linear voltage response. Alternative solutions are discussed below.

The efficiency of an electrooptic grating can be written, at Bragg incidence, as

$$\eta = \sin^2(\alpha V)$$

where $\alpha$ is a constant. This nonlinear response means that some method must be found to produce a voltage from the input variable so that an increment in the input variable produces a proportional increment in $\eta$. Let x denote the input variable, scaled to lie between 0 and 1. Then, we need to find a voltage V(x) of the form

$$V(x) = \sin^{-1}(\sqrt{x})/\alpha$$

This can be done with digital electronics, requiring one circuit for each electrooptic grating. Alternatively, some ac signal processing could be used, but this becomes more and more complicated as the order of the polynomials increases. The simplest solution will probably be to use an analog electronic circuit to extract the square root of x, and adjust the operating voltages so that one remains in the small signal regime. In this case,

$$\eta = (\alpha V)^2 = x$$

This keeps the circuitry simple, although it leads to a loss of signal-to-noise ratio. If noise becomes a problem, as it well may in large-order polynomials, then the full arcsine function must be used.
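The trade-off between the two drive laws can be made concrete with a short calculation. The value of $\alpha$ below is an arbitrary assumption (it cancels out of the comparison), and the function names are illustrative.

```python
import math

ALPHA = 0.3   # assumed grating constant in rad/volt; arbitrary for this comparison


def efficiency(v, alpha=ALPHA):
    """Bragg-incidence response, eta = sin^2(alpha * V)."""
    return math.sin(alpha * v) ** 2


def drive_exact(x, alpha=ALPHA):
    """Full arcsine predistortion: V(x) = arcsin(sqrt(x)) / alpha, for 0 <= x <= 1."""
    return math.asin(math.sqrt(x)) / alpha


def drive_small_signal(x, alpha=ALPHA):
    """Square-root-only predistortion: V(x) = sqrt(x) / alpha."""
    return math.sqrt(x) / alpha


def relative_error(x):
    """Fractional shortfall of eta from x when only the square root is applied."""
    return (x - efficiency(drive_small_signal(x))) / x


exact_result = efficiency(drive_exact(0.5))     # recovers x with the full arcsine law
error_small = relative_error(0.01)              # well inside the small-signal regime
error_large = relative_error(0.5)               # outside it
```

The square-root drive is accurate to a fraction of a percent when $\eta$ is held near 0.01 but falls short by more than ten percent at $\eta = 0.5$. Staying accurate therefore means using only a small part of the available diffracted light, which is the origin of the signal-to-noise penalty.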

#### Conclusions

In this paper, we have reviewed several kinds of systolic architectures that can be used in an integrated-optical circuit to perform numerical computations ranging from simple logic operations to polynomial evaluation to matrix operations. All of the devices reviewed utilize electrooptically-induced gratings in an electrooptic waveguide. There are, of course, other ways to perform some of these operations, including surface acoustic waves; and there are surely many other numerical computations that can be performed optically using integrated optics. It is hoped that this review will stimulate others to join in the search for new applications in this exciting area.

#### References

1. H. T. Kung, "Why Systolic Architectures?," Computer, 15, 37-46 (1982).
2. Toshiaki Suhara, Hiroshi Nishihara and Jiro Koyama, "High-Efficiency Relief-Type Waveguide Hologram," Trans. IECE Japan, E61, 167-170 (1978).
3. T. Kurokawa and S. Oikawa, "Optical Waveguide Intersections Without Light Leak," Appl. Opt., 16, 1033-1037 (1977); Hirochika Nakajima, Tetsuo Horimatsu, Minoru Seino and Toppei Sawaki, "Crosstalk Characteristics of Ti-LiNbO<sub>3</sub> Intersecting Waveguides and Their Application as TE/TM Mode Splitters," IEEE J. Quantum Electr., QE-18, 771-776 (1982).
4. Pascal Thioulouse, Alain Carenco and Robert Guglielmi, "High-Speed Modulation of an Electrooptical Directional Coupler," IEEE J. Quantum Electr., QE-17, 535-541 (1981); P. S. Cross and R. V. Schmidt, "A 1 Gbit/s Integrated Optical Modulator," IEEE J. Quantum Electr., QE-15, 1415-1418 (1979).
5. R. V. Schmidt and H. Kogelnik, "Electro-Optically Switched Coupler with Stepped Δβ Reversal Using Ti-Diffused LiNbO<sub>3</sub> Waveguides," Appl. Phys. Lett., 28, 503-506 (1976).
6. Helge Engan, "Excitation of Elastic Surface Waves by Spatial Harmonics of Interdigital Transducers," IEEE Trans. on Electron Devices, ED-16, 1014-1017 (1969).
7. Herwig Kogelnik, "Coupled Wave Theory for Thick Hologram Gratings," Bell Syst. Tech. J., 48, 2909-2947 (1969).
8. C. M. Verber, R. P. Kenan and J. R. Busch, "An Integrated Optical Spatial Filter," Opt. Comm., 34, 32-34 (1980).
9. C. M. Verber, R. P. Kenan and J. R. Busch, "Correlator Based on an Integrated Optical Spatial Light Modulator," Appl. Opt., 20, 1626-1629 (1981).
10. C. M. Verber, R. P. Kenan and J. R. Busch, "Design and Performance of an Integrated Optical Digital Correlator," IEEE J. Lightwave Technology (to be published, 1983).
11. C. M. Verber, R. P. Kenan, H. J. Caulfield, Jaques E. Ludman and P. Denzil Stilwell, Jr., "Suggested Integrated Optical Implementation of Pipelined Polynomial Processors," Paper presented at the SPIE Los Angeles Technical Symposium and Technical Exhibit, January 17-21, 1983, Los Angeles, CA.

\*Work supported in part by AFOSR through contract F49620-79-C-0044 and in part by NASA, Langley Research Center through contract NAS1-16652.



Fig. 1. The basic electrode structure for inducing electrooptic gratings, showing the electrode parameters.



Fig. 2. Schematic of the use of an induced grating for subtraction (or logical EXOR).



Fig. 3. Schematic of the "herringbone" electrode structure used for multiplication (or logical AND).



Fig. 4. Schematic of the use of a segmented herringbone to accomplish vector multiplication.



Fig. 5. Illustration of the engagement architecture for vector-matrix multiplication.



Fig. 6. Integrated-optical realization of the architecture of Fig. 5.



Fig. 7. Illustration of a systolic architecture for matrix-matrix multiplication.



Fig. 8. Schematic layout for an integrated-optical realization of the architecture of Fig. 7.



Fig. 9. The basic calculation module.  $p_{n+1}(x)$  is the partially assembled polynomial from the previous stage.  $p_n(x)$  is the output from the present stage.



Fig. 10. An integrated-optical implementation of the pipeline processor, utilizing electrooptic grating modulators and surface grating adders.



Fig. 11. Illustration of the problem-division technique. Top: divisions needed to accommodate complex, positive  $a_k$  and complex x of either sign. Bottom: further division to accommodate either sign for  $a_k$ .
